Overview

Dataset statistics

Number of variables: 13
Number of observations: 1000
Missing cells: 2078
Missing cells (%): 16.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 101.7 KiB
Average record size in memory: 104.1 B
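
The two derived figures in this block can be reproduced from the raw counts. A minimal sketch in Python, assuming the percentage is a plain ratio of missing cells to all cells and that 1 KiB = 1024 B; the variable names are illustrative and not part of the report:

```python
# Raw counts taken from the "Dataset statistics" block.
n_variables = 13
n_observations = 1000
missing_cells = 2078
total_size_kib = 101.7

# Share of missing cells across the whole table (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per row, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both values round to the figures reported here (16.0% and 104.1 B).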

Variable types

Numeric: 2
Categorical: 11

Alerts

countryCode has constant value "GB" Constant
name has a high cardinality: 993 distinct values High cardinality
foundingDate has a high cardinality: 875 distinct values High cardinality
dissolutionDate has a high cardinality: 132 distinct values High cardinality
companiesHouseID has a high cardinality: 1000 distinct values High cardinality
openCorporatesID has a high cardinality: 993 distinct values High cardinality
openOwnershipRegisterID has a high cardinality: 1000 distinct values High cardinality
SICCode_SicText_1 has a high cardinality: 195 distinct values High cardinality
CompanyCategory is highly correlated with countryCode High correlation
Accounts_AccountCategory is highly correlated with countryCode High correlation
CompanyStatus is highly correlated with countryCode High correlation
countryCode is highly correlated with CompanyCategory and 2 other fields High correlation
dissolutionDate has 749 (74.9%) missing values Missing
CompanyCategory has 327 (32.7%) missing values Missing
CompanyStatus has 327 (32.7%) missing values Missing
Accounts_AccountCategory has 327 (32.7%) missing values Missing
SICCode_SicText_1 has 327 (32.7%) missing values Missing
name is uniformly distributed Uniform
foundingDate is uniformly distributed Uniform
companiesHouseID is uniformly distributed Uniform
openCorporatesID is uniformly distributed Uniform
openOwnershipRegisterID is uniformly distributed Uniform
df_index has unique values Unique
statementID has unique values Unique
companiesHouseID has unique values Unique
openOwnershipRegisterID has unique values Unique
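
The per-variable missing counts in this report add up to the overall "Missing cells" figure. A short check in Python, assuming the five variables not listed in the dictionary (df_index, statementID, countryCode, companiesHouseID, openOwnershipRegisterID) have no missing values, as their own sections state. The counts of 7 come from the per-variable sections rather than from the alerts:

```python
# Missing-value counts per variable, as reported in the alerts and variable sections.
missing_by_variable = {
    "dissolutionDate": 749,
    "CompanyCategory": 327,
    "CompanyStatus": 327,
    "Accounts_AccountCategory": 327,
    "SICCode_SicText_1": 327,
    "name": 7,
    "foundingDate": 7,
    "openCorporatesID": 7,
}

total_missing = sum(missing_by_variable.values())
print(total_missing)  # 2078, the "Missing cells" value in the overview
```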

Reproduction

Analysis started: 2022-06-01 21:23:51.293361
Analysis finished: 2022-06-01 21:24:10.024138
Duration: 18.73 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
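
The reported duration follows directly from the two timestamps. A minimal sketch using Python's standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the "Reproduction" block.
started = datetime.strptime("2022-06-01 21:23:51.293361", FMT)
finished = datetime.strptime("2022-06-01 21:24:10.024138", FMT)

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```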

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3657.32
Minimum: 11
Maximum: 7283
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 11
5-th percentile: 400.8
Q1: 1824
median: 3634
Q3: 5403.75
95-th percentile: 6834.05
Maximum: 7283
Range: 7272
Interquartile range (IQR): 3579.75

Descriptive statistics

Standard deviation: 2077.900089
Coefficient of variation (CV): 0.5681482858
Kurtosis: -1.203192438
Mean: 3657.32
Median Absolute Deviation (MAD): 1786
Skewness: -0.01622520546
Sum: 3657320
Variance: 4317668.778
Monotonicity: Not monotonic
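
Several of the df_index statistics are simple functions of the others, which gives a quick consistency check. A sketch in Python, assuming the usual definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations):

```python
import math

# Values copied from the df_index section.
minimum, maximum = 11, 7283
q1, q3 = 1824, 5403.75
mean, std = 3657.32, 2077.900089
n_observations = 1000

value_range = maximum - minimum   # reported as Range
iqr = q3 - q1                     # reported as Interquartile range (IQR)
cv = std / mean                   # reported as Coefficient of variation (CV)
variance = std ** 2               # reported as Variance
total = mean * n_observations     # reported as Sum

print(value_range, iqr)
print(round(cv, 4), round(variance, 1), round(total))

# Compare against the reported figures, allowing for rounding in the report.
assert math.isclose(cv, 0.5681482858, rel_tol=1e-6)
assert math.isclose(variance, 4317668.778, rel_tol=1e-6)
```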
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
2480 | 1 | 0.1%
3316 | 1 | 0.1%
6528 | 1 | 0.1%
6526 | 1 | 0.1%
1572 | 1 | 0.1%
2686 | 1 | 0.1%
2575 | 1 | 0.1%
1180 | 1 | 0.1%
2131 | 1 | 0.1%
3290 | 1 | 0.1%
Other values (990) | 990 | 99.0%

Minimum 10 values
Value | Count | Frequency (%)
11 | 1 | 0.1%
17 | 1 | 0.1%
20 | 1 | 0.1%
27 | 1 | 0.1%
30 | 1 | 0.1%
41 | 1 | 0.1%
52 | 1 | 0.1%
56 | 1 | 0.1%
60 | 1 | 0.1%
69 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
7283 | 1 | 0.1%
7273 | 1 | 0.1%
7267 | 1 | 0.1%
7265 | 1 | 0.1%
7242 | 1 | 0.1%
7240 | 1 | 0.1%
7234 | 1 | 0.1%
7227 | 1 | 0.1%
7219 | 1 | 0.1%
7217 | 1 | 0.1%

statementID
Real number (ℝ≥0)

UNIQUE

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.022696483 × 10^18
Minimum: 2.207652262 × 10^16
Maximum: 1.843110538 × 10^19
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 2.207652262 × 10^16
5-th percentile: 8.793039468 × 10^17
Q1: 4.53594823 × 10^18
median: 8.68441055 × 10^18
Q3: 1.359725972 × 10^19
95-th percentile: 1.75766796 × 10^19
Maximum: 1.843110538 × 10^19
Range: 1.840902886 × 10^19
Interquartile range (IQR): 9.061311494 × 10^18

Descriptive statistics

Standard deviation: 5.289407333 × 10^18
Coefficient of variation (CV): 0.5862335436
Kurtosis: -1.178589278
Mean: 9.022696483 × 10^18
Median Absolute Deviation (MAD): 4.532392026 × 10^18
Skewness: 0.09522403697
Sum: 9.022696483 × 10^21
Variance: 2.797782993 × 10^37
Monotonicity: Not monotonic
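
The maximum, 1.843110538 × 10^19, sits just below 2^64 (about 1.8447 × 10^19). That would be consistent with statementID holding unsigned 64-bit identifiers, although this is an assumption; the report does not say how the column was produced. If so, treating the column as a real number is lossy, because a 64-bit float has only 53 bits of precision. A small Python illustration using made-up identifiers:

```python
# Two distinct hypothetical 64-bit identifiers (not taken from the dataset).
id_a = 2**63 + 1
id_b = 2**63 + 2

# As Python integers they differ; converted to 64-bit floats they collapse
# to the same value, because neighbouring floats near 2**63 are 2048 apart.
assert id_a != id_b
print(float(id_a) == float(id_b))  # True

# Largest value reported for statementID, compared with the uint64 ceiling.
reported_max = 1.843110538e19
print(reported_max < 2**64)  # True
```

Under that assumption, the summary statistics above (mean, variance, skewness) describe the identifiers' numeric spread but carry no analytical meaning, and the column is safer handled as an integer or string key.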
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
3.009810236 × 10^18 | 1 | 0.1%
5.22541436 × 10^18 | 1 | 0.1%
1.14865743 × 10^19 | 1 | 0.1%
1.115993323 × 10^19 | 1 | 0.1%
1.373399064 × 10^18 | 1 | 0.1%
6.657484594 × 10^18 | 1 | 0.1%
1.320294528 × 10^19 | 1 | 0.1%
1.211590486 × 10^19 | 1 | 0.1%
1.376978863 × 10^19 | 1 | 0.1%
1.26145512 × 10^19 | 1 | 0.1%
Other values (990) | 990 | 99.0%

Minimum 10 values
Value | Count | Frequency (%)
2.207652262 × 10^16 | 1 | 0.1%
6.078762817 × 10^16 | 1 | 0.1%
6.984294855 × 10^16 | 1 | 0.1%
1.312573226 × 10^17 | 1 | 0.1%
1.433841229 × 10^17 | 1 | 0.1%
1.481273235 × 10^17 | 1 | 0.1%
1.868366202 × 10^17 | 1 | 0.1%
2.11249959 × 10^17 | 1 | 0.1%
2.164221699 × 10^17 | 1 | 0.1%
2.173754228 × 10^17 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1.843110538 × 10^19 | 1 | 0.1%
1.842974351 × 10^19 | 1 | 0.1%
1.842214572 × 10^19 | 1 | 0.1%
1.841969631 × 10^19 | 1 | 0.1%
1.841285706 × 10^19 | 1 | 0.1%
1.835112392 × 10^19 | 1 | 0.1%
1.834968919 × 10^19 | 1 | 0.1%
1.832073919 × 10^19 | 1 | 0.1%
1.831679345 × 10^19 | 1 | 0.1%
1.823792987 × 10^19 | 1 | 0.1%

name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 993
Distinct (%): 100.0%
Missing: 7
Missing (%): 0.7%
Memory size: 7.9 KiB

Value | Count
CIFRA SERVICES LTD | 1
ARCHAX CAPITAL LTD | 1
LOWER THE BAR LIMITED | 1
TOKA EVENTS LTD | 1
OPTIMISTIC INFLUENCE LTD | 1
Other values (988) | 988

Length

Max length: 72
Median length: 44
Mean length: 23.28096677
Min length: 8

Characters and Unicode

Total characters: 23118
Distinct characters: 49
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 993
Unique (%): 100.0%

Sample

1st row: CIFRA SERVICES LTD
2nd row: BRADFORD ONLINE LIMITED
3rd row: FIRE PROTECTION PRODUCTS LIMITED
4th row: ALLSITE CATERING LIMITED
5th row: KNOWLES SYSTEMS LIMITED

Common Values

Value | Count | Frequency (%)
CIFRA SERVICES LTD | 1 | 0.1%
ARCHAX CAPITAL LTD | 1 | 0.1%
LOWER THE BAR LIMITED | 1 | 0.1%
TOKA EVENTS LTD | 1 | 0.1%
OPTIMISTIC INFLUENCE LTD | 1 | 0.1%
WANDERLUST CLUB LIMITED | 1 | 0.1%
GOOD NUTS LTD | 1 | 0.1%
SWAN COMMERCIAL PROPERTIES LIMITED | 1 | 0.1%
MANOR GARDENS (HOLSWORTHY) MANAGEMENT COMPANY LIMITED | 1 | 0.1%
CHAKIB RICHANI COLLECTION LTD | 1 | 0.1%
Other values (983) | 983 | 98.3%
(Missing) | 7 | 0.7%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
limited | 513 | 14.8%
ltd | 449 | 13.0%
services | 55 | 1.6%
(blank) | 50 | 1.4%
the | 34 | 1.0%
uk | 31 | 0.9%
company | 25 | 0.7%
and | 22 | 0.6%
property | 22 | 0.6%
solutions | 22 | 0.6%
Other values (1657) | 2240 | 64.7%

Most occurring characters

Value | Count | Frequency (%)
(space) | 2470 | 10.7%
I | 2139 | 9.3%
E | 2109 | 9.1%
T | 2058 | 8.9%
L | 1739 | 7.5%
D | 1398 | 6.0%
A | 1237 | 5.4%
S | 1204 | 5.2%
R | 1173 | 5.1%
N | 1168 | 5.1%
Other values (39) | 6423 | 27.8%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 20305 | 87.8%
Space Separator | 2470 | 10.7%
Other Punctuation | 129 | 0.6%
Decimal Number | 103 | 0.4%
Open Punctuation | 50 | 0.2%
Close Punctuation | 50 | 0.2%
Dash Punctuation | 10 | < 0.1%
Final Punctuation | 1 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
I | 2139 | 10.5%
E | 2109 | 10.4%
T | 2058 | 10.1%
L | 1739 | 8.6%
D | 1398 | 6.9%
A | 1237 | 6.1%
S | 1204 | 5.9%
R | 1173 | 5.8%
N | 1168 | 5.8%
O | 1102 | 5.4%
Other values (18) | 4978 | 24.5%

Decimal Number
Value | Count | Frequency (%)
1 | 26 | 25.2%
2 | 17 | 16.5%
4 | 12 | 11.7%
7 | 10 | 9.7%
0 | 10 | 9.7%
9 | 8 | 7.8%
5 | 8 | 7.8%
3 | 6 | 5.8%
8 | 3 | 2.9%
6 | 3 | 2.9%

Other Punctuation
Value | Count | Frequency (%)
. | 62 | 48.1%
& | 56 | 43.4%
' | 8 | 6.2%
@ | 1 | 0.8%
, | 1 | 0.8%
/ | 1 | 0.8%

Space Separator
Value | Count | Frequency (%)
(space) | 2470 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 50 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 50 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 10 | 100.0%

Final Punctuation
Value | Count | Frequency (%)
(character not shown in source) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 20305 | 87.8%
Common | 2813 | 12.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
I2139
10.5%
E2109
10.4%
T2058
10.1%
L1739
 
8.6%
D1398
 
6.9%
A1237
 
6.1%
S1204
 
5.9%
R1173
 
5.8%
N1168
 
5.8%
O1102
 
5.4%
Other values (18)4978
24.5%
Common
ValueCountFrequency (%)
2470
87.8%
.62
 
2.2%
&56
 
2.0%
(50
 
1.8%
)50
 
1.8%
126
 
0.9%
217
 
0.6%
412
 
0.4%
710
 
0.4%
010
 
0.4%
Other values (11)50
 
1.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 23115 | > 99.9%
None | 2 | < 0.1%
Punctuation | 1 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2470
 
10.7%
I2139
 
9.3%
E2109
 
9.1%
T2058
 
8.9%
L1739
 
7.5%
D1398
 
6.0%
A1237
 
5.4%
S1204
 
5.2%
R1173
 
5.1%
N1168
 
5.1%
Other values (36)6420
27.8%
None
ValueCountFrequency (%)
É1
50.0%
È1
50.0%
Punctuation
ValueCountFrequency (%)
1
100.0%

foundingDate
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 875
Distinct (%): 88.1%
Missing: 7
Missing (%): 0.7%
Memory size: 7.9 KiB

Value | Count
2019-08-09 | 3
2012-09-25 | 3
2021-08-06 | 3
2021-04-26 | 3
2018-04-05 | 3
Other values (870) | 978

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 9930
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 769
Unique (%): 77.4%

Sample

1st row: 2021-04-21
2nd row: 2018-11-12
3rd row: 2004-08-10
4th row: 2017-11-14
5th row: 2007-02-12
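
foundingDate is profiled as a categorical variable because it is stored as text, yet every value has length 10 and the sample rows follow the YYYY-MM-DD pattern. A minimal sketch, assuming all values are ISO 8601 calendar dates, of converting them with Python's standard library so they can be ordered and compared as dates. The list contains only the five sample rows shown above:

```python
from datetime import date

# The five sample rows from the foundingDate section.
sample_rows = ["2021-04-21", "2018-11-12", "2004-08-10", "2017-11-14", "2007-02-12"]

# date.fromisoformat raises ValueError on anything that is not YYYY-MM-DD,
# which also makes it a cheap validity check.
founding_dates = [date.fromisoformat(text) for text in sample_rows]

earliest = min(founding_dates)
latest = max(founding_dates)
print(earliest.isoformat(), latest.isoformat())
```

Once parsed, a profiler would treat the column as a date variable rather than flagging it for high cardinality.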

Common Values

Value | Count | Frequency (%)
2019-08-09 | 3 | 0.3%
2012-09-25 | 3 | 0.3%
2021-08-06 | 3 | 0.3%
2021-04-26 | 3 | 0.3%
2018-04-05 | 3 | 0.3%
2019-06-17 | 3 | 0.3%
2016-07-15 | 3 | 0.3%
2014-07-03 | 3 | 0.3%
2019-03-23 | 3 | 0.3%
2018-09-26 | 3 | 0.3%
Other values (865) | 963 | 96.3%
(Missing) | 7 | 0.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2019-08-093
 
0.3%
2014-07-033
 
0.3%
2012-09-253
 
0.3%
2018-10-033
 
0.3%
2018-09-263
 
0.3%
2019-03-233
 
0.3%
2022-02-243
 
0.3%
2016-07-153
 
0.3%
2019-06-173
 
0.3%
2018-04-053
 
0.3%
Other values (865)963
97.0%

Most occurring characters

ValueCountFrequency (%)
02440
24.6%
-1986
20.0%
21828
18.4%
11583
15.9%
9393
 
4.0%
8311
 
3.1%
6298
 
3.0%
5292
 
2.9%
7283
 
2.8%
3282
 
2.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number7944
80.0%
Dash Punctuation1986
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
02440
30.7%
21828
23.0%
11583
19.9%
9393
 
4.9%
8311
 
3.9%
6298
 
3.8%
5292
 
3.7%
7283
 
3.6%
3282
 
3.5%
4234
 
2.9%
Dash Punctuation
ValueCountFrequency (%)
-1986
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common9930
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
02440
24.6%
-1986
20.0%
21828
18.4%
11583
15.9%
9393
 
4.0%
8311
 
3.1%
6298
 
3.0%
5292
 
2.9%
7283
 
2.8%
3282
 
2.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII9930
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
02440
24.6%
-1986
20.0%
21828
18.4%
11583
15.9%
9393
 
4.0%
8311
 
3.1%
6298
 
3.0%
5292
 
2.9%
7283
 
2.8%
3282
 
2.8%

dissolutionDate
Categorical

HIGH CARDINALITY
MISSING

Distinct: 132
Distinct (%): 52.6%
Missing: 749
Missing (%): 74.9%
Memory size: 7.9 KiB

Value | Count
2020-10-13 | 8
2018-01-02 | 6
2021-03-23 | 6
2020-03-17 | 6
2021-03-16 | 4
Other values (127) | 221

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 2510
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 70
Unique (%): 27.9%

Sample

1st row: 2020-10-20
2nd row: 2019-04-23
3rd row: 2021-04-06
4th row: 2021-05-25
5th row: 2018-03-20

Common Values

Value | Count | Frequency (%)
2020-10-13 | 8 | 0.8%
2018-01-02 | 6 | 0.6%
2021-03-23 | 6 | 0.6%
2020-03-17 | 6 | 0.6%
2021-03-16 | 4 | 0.4%
2021-01-26 | 4 | 0.4%
2020-02-04 | 4 | 0.4%
2020-10-20 | 4 | 0.4%
2020-12-22 | 4 | 0.4%
2021-06-22 | 4 | 0.4%
Other values (122) | 201 | 20.1%
(Missing) | 749 | 74.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2020-10-138
 
3.2%
2021-03-236
 
2.4%
2020-03-176
 
2.4%
2018-01-026
 
2.4%
2020-12-224
 
1.6%
2017-12-194
 
1.6%
2021-06-224
 
1.6%
2018-02-274
 
1.6%
2020-10-204
 
1.6%
2020-02-044
 
1.6%
Other values (122)201
80.1%

Most occurring characters

ValueCountFrequency (%)
0611
24.3%
2526
21.0%
-502
20.0%
1426
17.0%
998
 
3.9%
896
 
3.8%
375
 
3.0%
764
 
2.5%
645
 
1.8%
439
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2008
80.0%
Dash Punctuation502
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0611
30.4%
2526
26.2%
1426
21.2%
998
 
4.9%
896
 
4.8%
375
 
3.7%
764
 
3.2%
645
 
2.2%
439
 
1.9%
528
 
1.4%
Dash Punctuation
ValueCountFrequency (%)
-502
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2510
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0611
24.3%
2526
21.0%
-502
20.0%
1426
17.0%
998
 
3.9%
896
 
3.8%
375
 
3.0%
764
 
2.5%
645
 
1.8%
439
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII2510
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0611
24.3%
2526
21.0%
-502
20.0%
1426
17.0%
998
 
3.9%
896
 
3.8%
375
 
3.0%
764
 
2.5%
645
 
1.8%
439
 
1.6%

countryCode
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value | Count
GB | 1000

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 2000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: GB
2nd row: GB
3rd row: GB
4th row: GB
5th row: GB

Common Values

Value | Count | Frequency (%)
GB | 1000 | 100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
gb | 1000 | 100.0%

Most occurring characters

ValueCountFrequency (%)
G1000
50.0%
B1000
50.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter2000
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
G1000
50.0%
B1000
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2000
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
G1000
50.0%
B1000
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
G1000
50.0%
B1000
50.0%

companiesHouseID
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value | Count
13348512 | 1
06379523 | 1
11806852 | 1
11288817 | 1
13657977 | 1
Other values (995) | 995

Length

Max length: 8
Median length: 8
Mean length: 7.999
Min length: 7

Characters and Unicode

Total characters: 7999
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1000
Unique (%): 100.0%

Sample

1st row: 13348512
2nd row: 11673158
3rd row: NI051395
4th row: 11062947
5th row: 06094377
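
Most identifiers are eight characters long, but "Min length 7" together with "Mean length 7.999" implies exactly one seven-character value among the 1000 (999 × 8 + 7 = 7999, which matches "Total characters"). A hedged sketch of a format check in Python, assuming a well-formed identifier is either eight digits or two uppercase letters followed by six digits, as the sample rows suggest. The pattern and the function name are illustrative and are not an official Companies House rule:

```python
import re

# Assumed shape: 8 digits, or a 2-letter prefix plus 6 digits (e.g. "NI051395").
ID_PATTERN = re.compile(r"(?:\d{8}|[A-Z]{2}\d{6})")


def looks_well_formed(company_id: str) -> bool:
    """Return True if the identifier matches the assumed 8-character shape."""
    return ID_PATTERN.fullmatch(company_id) is not None


# The five sample rows from the companiesHouseID section.
sample_rows = ["13348512", "11673158", "NI051395", "11062947", "06094377"]
print(all(looks_well_formed(value) for value in sample_rows))  # True

# A hypothetical 7-character value, e.g. an ID that lost its leading zero.
print(looks_well_formed("6094377"))  # False
```

Running such a check over the full column would locate the single short value so it can be inspected by hand.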

Common Values

Value | Count | Frequency (%)
13348512 | 1 | 0.1%
06379523 | 1 | 0.1%
11806852 | 1 | 0.1%
11288817 | 1 | 0.1%
13657977 | 1 | 0.1%
12296905 | 1 | 0.1%
11558755 | 1 | 0.1%
11243377 | 1 | 0.1%
06369590 | 1 | 0.1%
10353571 | 1 | 0.1%
Other values (990) | 990 | 99.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
133485121
 
0.1%
030257371
 
0.1%
083041991
 
0.1%
131700641
 
0.1%
ni0513951
 
0.1%
110629471
 
0.1%
060943771
 
0.1%
122247821
 
0.1%
oc4028941
 
0.1%
126741781
 
0.1%
Other values (990)990
99.0%

Most occurring characters

ValueCountFrequency (%)
11285
16.1%
01101
13.8%
2742
9.3%
9742
9.3%
3736
9.2%
7691
8.6%
5665
8.3%
8657
8.2%
6618
7.7%
4602
7.5%
Other values (6)160
 
2.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number7839
98.0%
Uppercase Letter160
 
2.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
11285
16.4%
01101
14.0%
2742
9.5%
9742
9.5%
3736
9.4%
7691
8.8%
5665
8.5%
8657
8.4%
6618
7.9%
4602
7.7%
Uppercase Letter
ValueCountFrequency (%)
C70
43.8%
S63
39.4%
O13
 
8.1%
N6
 
3.8%
I6
 
3.8%
L2
 
1.2%

Most occurring scripts

ValueCountFrequency (%)
Common7839
98.0%
Latin160
 
2.0%

Most frequent character per script

Common
ValueCountFrequency (%)
11285
16.4%
01101
14.0%
2742
9.5%
9742
9.5%
3736
9.4%
7691
8.8%
5665
8.5%
8657
8.4%
6618
7.9%
4602
7.7%
Latin
ValueCountFrequency (%)
C70
43.8%
S63
39.4%
O13
 
8.1%
N6
 
3.8%
I6
 
3.8%
L2
 
1.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII7999
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
11285
16.1%
01101
13.8%
2742
9.3%
9742
9.3%
3736
9.2%
7691
8.6%
5665
8.3%
8657
8.2%
6618
7.7%
4602
7.5%
Other values (6)160
 
2.0%

openCorporatesID
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 993
Distinct (%): 100.0%
Missing: 7
Missing (%): 0.7%
Memory size: 7.9 KiB

Value | Count
https://opencorporates.com/companies/gb/13348512 | 1
https://opencorporates.com/companies/gb/11477975 | 1
https://opencorporates.com/companies/gb/11288817 | 1
https://opencorporates.com/companies/gb/13657977 | 1
https://opencorporates.com/companies/gb/12296905 | 1
Other values (988) | 988

Length

Max length: 48
Median length: 48
Mean length: 48
Min length: 48

Characters and Unicode

Total characters: 47664
Distinct characters: 33
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 993
Unique (%): 100.0%

Sample

1st row: https://opencorporates.com/companies/gb/13348512
2nd row: https://opencorporates.com/companies/gb/11673158
3rd row: https://opencorporates.com/companies/gb/NI051395
4th row: https://opencorporates.com/companies/gb/11062947
5th row: https://opencorporates.com/companies/gb/06094377
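
In the five sample rows, each openCorporatesID is the matching companiesHouseID sample appended to a fixed prefix. A minimal sketch of that relationship in Python, assuming it holds for every non-missing row (the report shows only the samples). The function name is hypothetical:

```python
# Prefix shared by all five sample rows of openCorporatesID.
OPENCORPORATES_PREFIX = "https://opencorporates.com/companies/gb/"


def open_corporates_url(companies_house_id: str) -> str:
    """Build the URL by appending the company number to the fixed prefix."""
    return OPENCORPORATES_PREFIX + companies_house_id


# Third sample row of companiesHouseID.
url = open_corporates_url("NI051395")
print(url)
print(len(url))  # 48, matching the constant length reported for this variable
```

If the assumption holds, the column is fully derivable from companiesHouseID and adds no independent information, which is worth knowing before modelling or deduplication.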

Common Values

ValueCountFrequency (%)
https://opencorporates.com/companies/gb/13348512 | 1 | 0.1%
https://opencorporates.com/companies/gb/11477975 | 1 | 0.1%
https://opencorporates.com/companies/gb/11288817 | 1 | 0.1%
https://opencorporates.com/companies/gb/13657977 | 1 | 0.1%
https://opencorporates.com/companies/gb/12296905 | 1 | 0.1%
https://opencorporates.com/companies/gb/11558755 | 1 | 0.1%
https://opencorporates.com/companies/gb/11243377 | 1 | 0.1%
https://opencorporates.com/companies/gb/06369590 | 1 | 0.1%
https://opencorporates.com/companies/gb/10353571 | 1 | 0.1%
https://opencorporates.com/companies/gb/07412667 | 1 | 0.1%
Other values (983) | 983 | 98.3%
(Missing) | 7 | 0.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
https://opencorporates.com/companies/gb/133485121
 
0.1%
https://opencorporates.com/companies/gb/037678801
 
0.1%
https://opencorporates.com/companies/gb/131700641
 
0.1%
https://opencorporates.com/companies/gb/ni0513951
 
0.1%
https://opencorporates.com/companies/gb/110629471
 
0.1%
https://opencorporates.com/companies/gb/060943771
 
0.1%
https://opencorporates.com/companies/gb/122247821
 
0.1%
https://opencorporates.com/companies/gb/oc4028941
 
0.1%
https://opencorporates.com/companies/gb/126741781
 
0.1%
https://opencorporates.com/companies/gb/084470181
 
0.1%
Other values (983)983
99.0%

Most occurring characters

ValueCountFrequency (%)
/4965
 
10.4%
o4965
 
10.4%
p3972
 
8.3%
s2979
 
6.2%
e2979
 
6.2%
c2979
 
6.2%
t2979
 
6.2%
n1986
 
4.2%
r1986
 
4.2%
a1986
 
4.2%
Other values (23)15888
33.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter32769
68.8%
Decimal Number7784
 
16.3%
Other Punctuation6951
 
14.6%
Uppercase Letter160
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o4965
15.2%
p3972
12.1%
s2979
9.1%
e2979
9.1%
c2979
9.1%
t2979
9.1%
n1986
 
6.1%
r1986
 
6.1%
a1986
 
6.1%
m1986
 
6.1%
Other values (4)3972
12.1%
Decimal Number
ValueCountFrequency (%)
11273
16.4%
01100
14.1%
9739
9.5%
2738
9.5%
3725
9.3%
7687
8.8%
8654
8.4%
5654
8.4%
6616
7.9%
4598
7.7%
Uppercase Letter
ValueCountFrequency (%)
C70
43.8%
S63
39.4%
O13
 
8.1%
N6
 
3.8%
I6
 
3.8%
L2
 
1.2%
Other Punctuation
ValueCountFrequency (%)
/4965
71.4%
.993
 
14.3%
:993
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin32929
69.1%
Common14735
30.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
o4965
15.1%
p3972
12.1%
s2979
9.0%
e2979
9.0%
c2979
9.0%
t2979
9.0%
n1986
 
6.0%
r1986
 
6.0%
a1986
 
6.0%
m1986
 
6.0%
Other values (10)4132
12.5%
Common
ValueCountFrequency (%)
/4965
33.7%
11273
 
8.6%
01100
 
7.5%
.993
 
6.7%
:993
 
6.7%
9739
 
5.0%
2738
 
5.0%
3725
 
4.9%
7687
 
4.7%
8654
 
4.4%
Other values (3)1868
 
12.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII47664
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
/4965
 
10.4%
o4965
 
10.4%
p3972
 
8.3%
s2979
 
6.2%
e2979
 
6.2%
c2979
 
6.2%
t2979
 
6.2%
n1986
 
4.2%
r1986
 
4.2%
a1986
 
4.2%
Other values (23)15888
33.3%

openOwnershipRegisterID
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value | Count
http://register.openownership.org/entities/60999516e4e645384b151f9f | 1
http://register.openownership.org/entities/59b944a967e4ebf3408c185d | 1
http://register.openownership.org/entities/5c5bdddd9dfc3fae182d2bf7 | 1
http://register.openownership.org/entities/5b166c5a9dfc3fae185f7481 | 1
http://register.openownership.org/entities/6169714a5a8495d725b5f0d6 | 1
Other values (995) | 995

Length

Max length: 67
Median length: 67
Mean length: 67
Min length: 67

Characters and Unicode

Total characters: 67000
Distinct characters: 29
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1000
Unique (%): 100.0%

Sample

1st row: http://register.openownership.org/entities/60999516e4e645384b151f9f
2nd row: http://register.openownership.org/entities/5bf44c979dfc3fae18c57ab0
3rd row: http://register.openownership.org/entities/59b93d3767e4ebf3406ca5bb
4th row: http://register.openownership.org/entities/5aba261d9dfc3fae1894f164
5th row: http://register.openownership.org/entities/59c500f167e4ebf3405d9caa

Common Values

ValueCountFrequency (%)
http://register.openownership.org/entities/60999516e4e645384b151f9f | 1 | 0.1%
http://register.openownership.org/entities/59b944a967e4ebf3408c185d | 1 | 0.1%
http://register.openownership.org/entities/5c5bdddd9dfc3fae182d2bf7 | 1 | 0.1%
http://register.openownership.org/entities/5b166c5a9dfc3fae185f7481 | 1 | 0.1%
http://register.openownership.org/entities/6169714a5a8495d725b5f0d6 | 1 | 0.1%
http://register.openownership.org/entities/5dc2c4149dfc3fae18cbd17c | 1 | 0.1%
http://register.openownership.org/entities/5bbb801a9dfc3fae18511b6b | 1 | 0.1%
http://register.openownership.org/entities/5ab97d849dfc3fae18ee06c4 | 1 | 0.1%
http://register.openownership.org/entities/59b942d767e4ebf34085154d | 1 | 0.1%
http://register.openownership.org/entities/59b936ac67e4ebf3404e8d37 | 1 | 0.1%
Other values (990) | 990 | 99.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
http://register.openownership.org/entities/60999516e4e645384b151f9f1
 
0.1%
http://register.openownership.org/entities/59b9b73e67e4ebf34078b91b1
 
0.1%
http://register.openownership.org/entities/59b9622e67e4ebf34014d5b51
 
0.1%
http://register.openownership.org/entities/604ba888fcb54a9955eb134d1
 
0.1%
http://register.openownership.org/entities/59b93d3767e4ebf3406ca5bb1
 
0.1%
http://register.openownership.org/entities/5aba261d9dfc3fae1894f1641
 
0.1%
http://register.openownership.org/entities/59c500f167e4ebf3405d9caa1
 
0.1%
http://register.openownership.org/entities/5d8b7ca89dfc3fae180350e51
 
0.1%
http://register.openownership.org/entities/59b9635567e4ebf3401aad9b1
 
0.1%
http://register.openownership.org/entities/5f1f11ec84281976229cd3061
 
0.1%
Other values (990)990
99.0%

Most occurring characters

ValueCountFrequency (%)
e8082
 
12.1%
t5000
 
7.5%
i4000
 
6.0%
/4000
 
6.0%
r4000
 
6.0%
s3000
 
4.5%
p3000
 
4.5%
n3000
 
4.5%
o3000
 
4.5%
92156
 
3.2%
Other values (19)27762
41.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter45340
67.7%
Decimal Number14660
 
21.9%
Other Punctuation7000
 
10.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e8082
17.8%
t5000
11.0%
i4000
8.8%
r4000
8.8%
s3000
 
6.6%
p3000
 
6.6%
n3000
 
6.6%
o3000
 
6.6%
b2063
 
4.6%
g2000
 
4.4%
Other values (6)8195
18.1%
Decimal Number
ValueCountFrequency (%)
92156
14.7%
42070
14.1%
51715
11.7%
61545
10.5%
31526
10.4%
71389
9.5%
01338
9.1%
11050
7.2%
8990
6.8%
2881
6.0%
Other Punctuation
ValueCountFrequency (%)
/4000
57.1%
.2000
28.6%
:1000
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin45340
67.7%
Common21660
32.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e8082
17.8%
t5000
11.0%
i4000
8.8%
r4000
8.8%
s3000
 
6.6%
p3000
 
6.6%
n3000
 
6.6%
o3000
 
6.6%
b2063
 
4.6%
g2000
 
4.4%
Other values (6)8195
18.1%
Common
ValueCountFrequency (%)
/4000
18.5%
92156
10.0%
42070
9.6%
.2000
9.2%
51715
7.9%
61545
 
7.1%
31526
 
7.0%
71389
 
6.4%
01338
 
6.2%
11050
 
4.8%
Other values (3)2871
13.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII67000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e8082
 
12.1%
t5000
 
7.5%
i4000
 
6.0%
/4000
 
6.0%
r4000
 
6.0%
s3000
 
4.5%
p3000
 
4.5%
n3000
 
4.5%
o3000
 
4.5%
92156
 
3.2%
Other values (19)27762
41.4%

CompanyCategory
Categorical

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): 1.0%
Missing: 327
Missing (%): 32.7%
Memory size: 7.9 KiB

Value | Count
Private Limited Company | 630
PRI/LTD BY GUAR/NSC (Private, limited by guarantee, no share capital) | 24
Limited Liability Partnership | 8
PRI/LBG/NSC (Private, Limited by guarantee, no share capital, use of 'Limited' exemption) | 4
Community Interest Company | 4
Other values (2) | 3

Length

Max length: 89
Median length: 23
Mean length: 25.11292719
Min length: 19

Characters and Unicode

Total characters: 16901
Distinct characters: 41
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: Private Limited Company
2nd row: Private Limited Company
3rd row: Private Limited Company
4th row: Limited Liability Partnership
5th row: Private Limited Company

Common Values

ValueCountFrequency (%)
Private Limited Company | 630 | 63.0%
PRI/LTD BY GUAR/NSC (Private, limited by guarantee, no share capital) | 24 | 2.4%
Limited Liability Partnership | 8 | 0.8%
PRI/LBG/NSC (Private, Limited by guarantee, no share capital, use of 'Limited' exemption) | 4 | 0.4%
Community Interest Company | 4 | 0.4%
Limited Partnership | 2 | 0.2%
Private Unlimited Company | 1 | 0.1%
(Missing) | 327 | 32.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
limited | 672 | 30.3%
private | 659 | 29.7%
company | 635 | 28.6%
by | 52 | 2.3%
no | 28 | 1.3%
share | 28 | 1.3%
capital | 28 | 1.3%
guarantee | 28 | 1.3%
guar/nsc | 24 | 1.1%
pri/ltd | 24 | 1.1%
Other values (9) | 43 | 1.9%

Most occurring characters

ValueCountFrequency (%)
i2075
12.3%
1548
 
9.2%
a1452
 
8.6%
e1446
 
8.6%
t1422
 
8.4%
m1320
 
7.8%
r739
 
4.4%
n714
 
4.2%
P697
 
4.1%
L684
 
4.0%
Other values (31)4804
28.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter12808
75.8%
Uppercase Letter2365
 
14.0%
Space Separator1548
 
9.2%
Other Punctuation124
 
0.7%
Open Punctuation28
 
0.2%
Close Punctuation28
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i2075
16.2%
a1452
11.3%
e1446
11.3%
t1422
11.1%
m1320
10.3%
r739
 
5.8%
n714
 
5.6%
p677
 
5.3%
y675
 
5.3%
o675
 
5.3%
Other values (11)1613
12.6%
Uppercase Letter
ValueCountFrequency (%)
P697
29.5%
L684
28.9%
C667
28.2%
R52
 
2.2%
I32
 
1.4%
G28
 
1.2%
N28
 
1.2%
S28
 
1.2%
B28
 
1.2%
U25
 
1.1%
Other values (4)96
 
4.1%
Other Punctuation
ValueCountFrequency (%)
,60
48.4%
/56
45.2%
'8
 
6.5%
Space Separator
ValueCountFrequency (%)
1548
100.0%
Open Punctuation
ValueCountFrequency (%)
(28
100.0%
Close Punctuation
ValueCountFrequency (%)
)28
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin15173
89.8%
Common1728
 
10.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
i2075
13.7%
a1452
 
9.6%
e1446
 
9.5%
t1422
 
9.4%
m1320
 
8.7%
r739
 
4.9%
n714
 
4.7%
P697
 
4.6%
L684
 
4.5%
p677
 
4.5%
Other values (25)3947
26.0%
Common
ValueCountFrequency (%)
1548
89.6%
,60
 
3.5%
/56
 
3.2%
(28
 
1.6%
)28
 
1.6%
'8
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII16901
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i2075
12.3%
1548
 
9.2%
a1452
 
8.6%
e1446
 
8.6%
t1422
 
8.4%
m1320
 
7.8%
r739
 
4.4%
n714
 
4.2%
P697
 
4.1%
L684
 
4.0%
Other values (31)4804
28.4%

CompanyStatus
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 0.4%
Missing: 327
Missing (%): 32.7%
Memory size: 7.9 KiB

Value | Count
Active | 639
Active - Proposal to Strike off | 29
Liquidation | 5

Length

Max length: 31
Median length: 6
Mean length: 7.114413076
Min length: 6

Characters and Unicode

Total characters: 4788
Distinct characters: 23
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Active
2nd row: Active
3rd row: Active
4th row: Active - Proposal to Strike off
5th row: Active

Common Values

ValueCountFrequency (%)
Active | 639 | 63.9%
Active - Proposal to Strike off | 29 | 2.9%
Liquidation | 5 | 0.5%
(Missing) | 327 | 32.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
active | 668 | 81.7%
(blank) | 29 | 3.5%
proposal | 29 | 3.5%
to | 29 | 3.5%
strike | 29 | 3.5%
off | 29 | 3.5%
liquidation | 5 | 0.6%

Most occurring characters

ValueCountFrequency (%)
t731
15.3%
i712
14.9%
e697
14.6%
A668
14.0%
v668
14.0%
c668
14.0%
145
 
3.0%
o121
 
2.5%
f58
 
1.2%
r58
 
1.2%
Other values (13)262
 
5.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3883
81.1%
Uppercase Letter731
 
15.3%
Space Separator145
 
3.0%
Dash Punctuation29
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t731
18.8%
i712
18.3%
e697
18.0%
v668
17.2%
c668
17.2%
o121
 
3.1%
f58
 
1.5%
r58
 
1.5%
a34
 
0.9%
l29
 
0.7%
Other values (7)107
 
2.8%
Uppercase Letter
ValueCountFrequency (%)
A668
91.4%
S29
 
4.0%
P29
 
4.0%
L5
 
0.7%
Space Separator
ValueCountFrequency (%)
145
100.0%
Dash Punctuation
ValueCountFrequency (%)
-29
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4614
96.4%
Common174
 
3.6%

Most frequent character per script

Latin
Value              Count  Frequency (%)
t                  731    15.8%
i                  712    15.4%
e                  697    15.1%
A                  668    14.5%
v                  668    14.5%
c                  668    14.5%
o                  121    2.6%
f                  58     1.3%
r                  58     1.3%
a                  34     0.7%
Other values (11)  199    4.3%

Common
Value    Count  Frequency (%)
(space)  145    83.3%
-        29     16.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4788   100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
t                  731    15.3%
i                  712    14.9%
e                  697    14.6%
A                  668    14.0%
v                  668    14.0%
c                  668    14.0%
(space)            145    3.0%
o                  121    2.5%
f                  58     1.2%
r                  58     1.2%
Other values (13)  262    5.5%

Accounts_AccountCategory
Categorical

HIGH CORRELATION
MISSING

Distinct      10
Distinct (%)  1.5%
Missing       327
Missing (%)   32.7%
Memory size   7.9 KiB

MICRO ENTITY          205
TOTAL EXEMPTION FULL  175
NO ACCOUNTS FILED     168
DORMANT               77
UNAUDITED ABRIDGED    22
Other values (5)      26

Length

Max length     21
Median length  20
Mean length    14.70430906
Min length     4

Characters and Unicode

Total characters     9896
Distinct characters  20
Distinct categories  2
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      2
Unique (%)  0.3%

Sample

1st row  NO ACCOUNTS FILED
2nd row  MICRO ENTITY
3rd row  MICRO ENTITY
4th row  TOTAL EXEMPTION FULL
5th row  MICRO ENTITY

Common Values

Value                  Count  Frequency (%)
MICRO ENTITY           205    20.5%
TOTAL EXEMPTION FULL   175    17.5%
NO ACCOUNTS FILED      168    16.8%
DORMANT                77     7.7%
UNAUDITED ABRIDGED     22     2.2%
FULL                   12     1.2%
SMALL                  8      0.8%
GROUP                  4      0.4%
TOTAL EXEMPTION SMALL  1      0.1%
AUDITED ABRIDGED       1      0.1%
(Missing)              327    32.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value             Count  Frequency (%)
micro             205    12.9%
entity            205    12.9%
full              187    11.8%
total             176    11.1%
exemption         176    11.1%
no                168    10.6%
accounts          168    10.6%
filed             168    10.6%
dormant           77     4.8%
abridged          23     1.4%
Other values (4)  36     2.3%

Most occurring characters

Value              Count  Frequency (%)
T                  1206   12.2%
O                  974    9.8%
(space)            916    9.3%
N                  816    8.2%
I                  800    8.1%
E                  771    7.8%
L                  736    7.4%
C                  541    5.5%
A                  476    4.8%
M                  467    4.7%
Other values (10)  2193   22.2%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  8980   90.7%
Space Separator   916    9.3%

Most frequent character per category

Uppercase Letter
Value             Count  Frequency (%)
T                 1206   13.4%
O                 974    10.8%
N                 816    9.1%
I                 800    8.9%
E                 771    8.6%
L                 736    8.2%
C                 541    6.0%
A                 476    5.3%
M                 467    5.2%
U                 404    4.5%
Other values (9)  1789   19.9%

Space Separator
Value    Count  Frequency (%)
(space)  916    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   8980   90.7%
Common  916    9.3%

Most frequent character per script

Latin
Value             Count  Frequency (%)
T                 1206   13.4%
O                 974    10.8%
N                 816    9.1%
I                 800    8.9%
E                 771    8.6%
L                 736    8.2%
C                 541    6.0%
A                 476    5.3%
M                 467    5.2%
U                 404    4.5%
Other values (9)  1789   19.9%

Common
Value    Count  Frequency (%)
(space)  916    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  9896   100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
T                  1206   12.2%
O                  974    9.8%
(space)            916    9.3%
N                  816    8.2%
I                  800    8.1%
E                  771    7.8%
L                  736    7.4%
C                  541    5.5%
A                  476    4.8%
M                  467    4.7%
Other values (10)  2193   22.2%

SICCode_SicText_1
Categorical

HIGH CARDINALITY
MISSING

Distinct      195
Distinct (%)  29.0%
Missing       327
Missing (%)   32.7%
Memory size   7.9 KiB

82990 - Other business support service activities n.e.c.                   50
68209 - Other letting and operating of own or leased real estate           28
70229 - Management consultancy activities other than financial management  24
68100 - Buying and selling of own real estate                              19
96090 - Other service activities n.e.c.                                    17
Other values (190)                                                         535

Length

Max length     107
Median length  74
Mean length    48.10698366
Min length     13

Characters and Unicode

Total characters     32376
Distinct characters  66
Distinct categories  8
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      96
Unique (%)  14.3%

Sample

1st row  96090 - Other service activities n.e.c.
2nd row  43999 - Other specialised construction activities n.e.c.
3rd row  82990 - Other business support service activities n.e.c.
4th row  None Supplied
5th row  64209 - Activities of other holding companies n.e.c.

Common Values

Value                                                                      Count  Frequency (%)
82990 - Other business support service activities n.e.c.                   50     5.0%
68209 - Other letting and operating of own or leased real estate           28     2.8%
70229 - Management consultancy activities other than financial management  24     2.4%
68100 - Buying and selling of own real estate                              19     1.9%
96090 - Other service activities n.e.c.                                    17     1.7%
47910 - Retail sale via mail order houses or via Internet                  17     1.7%
99999 - Dormant Company                                                    17     1.7%
41100 - Development of building projects                                   15     1.5%
86900 - Other human health activities                                      13     1.3%
43999 - Other specialised construction activities n.e.c.                   13     1.3%
Other values (185)                                                         460    46.0%
(Missing)                                                                  327    32.7%

Length

Histogram of lengths of the category
Value               Count  Frequency (%)
                    663    13.4%
other               272    5.5%
activities          267    5.4%
of                  200    4.1%
and                 197    4.0%
n.e.c               114    2.3%
service             80     1.6%
management          76     1.5%
or                  67     1.4%
sale                66     1.3%
Other values (578)  2931   59.4%

Most occurring characters

Value              Count  Frequency (%)
(space)            4260   13.2%
e                  2828   8.7%
i                  2423   7.5%
t                  2292   7.1%
a                  1984   6.1%
n                  1941   6.0%
s                  1706   5.3%
o                  1480   4.6%
r                  1409   4.4%
c                  1232   3.8%
Other values (56)  10821  33.4%

Most occurring categories

Value              Count  Frequency (%)
Lowercase Letter   22958  70.9%
Space Separator    4260   13.2%
Decimal Number     3315   10.2%
Uppercase Letter   716    2.2%
Dash Punctuation   706    2.2%
Other Punctuation  417    1.3%
Close Punctuation  2      < 0.1%
Open Punctuation   2      < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count  Frequency (%)
e                  2828   12.3%
i                  2423   10.6%
t                  2292   10.0%
a                  1984   8.6%
n                  1941   8.5%
s                  1706   7.4%
o                  1480   6.4%
r                  1409   6.1%
c                  1232   5.4%
l                  996    4.3%
Other values (16)  4667   20.3%

Uppercase Letter
Value              Count  Frequency (%)
O                  216    30.2%
R                  68     9.5%
M                  62     8.7%
C                  44     6.1%
A                  38     5.3%
D                  37     5.2%
S                  36     5.0%
B                  31     4.3%
F                  28     3.9%
I                  27     3.8%
Other values (12)  129    18.0%

Decimal Number
Value  Count  Frequency (%)
0      791    23.9%
9      614    18.5%
2      437    13.2%
1      346    10.4%
4      265    8.0%
6      234    7.1%
8      215    6.5%
7      164    4.9%
3      148    4.5%
5      101    3.0%

Other Punctuation
Value  Count  Frequency (%)
.      343    82.3%
,      72     17.3%
;      1      0.2%
'      1      0.2%

Space Separator
Value    Count  Frequency (%)
(space)  4260   100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      706    100.0%

Close Punctuation
Value  Count  Frequency (%)
)      2      100.0%

Open Punctuation
Value  Count  Frequency (%)
(      2      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   23674  73.1%
Common  8702   26.9%

Most frequent character per script

Latin
Value              Count  Frequency (%)
e                  2828   11.9%
i                  2423   10.2%
t                  2292   9.7%
a                  1984   8.4%
n                  1941   8.2%
s                  1706   7.2%
o                  1480   6.3%
r                  1409   6.0%
c                  1232   5.2%
l                  996    4.2%
Other values (38)  5383   22.7%

Common
Value             Count  Frequency (%)
(space)           4260   49.0%
0                 791    9.1%
-                 706    8.1%
9                 614    7.1%
2                 437    5.0%
1                 346    4.0%
.                 343    3.9%
4                 265    3.0%
6                 234    2.7%
8                 215    2.5%
Other values (8)  491    5.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  32376  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)            4260   13.2%
e                  2828   8.7%
i                  2423   7.5%
t                  2292   7.1%
a                  1984   6.1%
n                  1941   6.0%
s                  1706   5.3%
o                  1480   4.6%
r                  1409   4.4%
c                  1232   3.8%
Other values (56)  10821  33.4%

Interactions

[Figures: interaction plots (4 panels)]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
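
The calculation described above can be sketched in plain Python as follows. This is a minimal illustration, not the report generator's own code: the helper names `average_ranks`, `covariance` and `spearman_rho` are hypothetical, and ties are assumed to receive the mean of the ranks they span.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def covariance(a, b):
    """Population covariance of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    return covariance(rx, ry) / math.sqrt(covariance(rx, rx) * covariance(ry, ry))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # nonlinear but strictly increasing in x
rho = spearman_rho(x, y)
print(rho)
```

Because `y` increases whenever `x` does, the two rank sequences coincide and ρ reaches its maximum, even though the relationship is not linear.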

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
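
A minimal sketch of that calculation, assuming population (divide-by-n) covariance and standard deviations; the function name `pearson_r` and the data are made up for illustration. The second call checks the invariance under changes of location and positive scale mentioned above.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]  # roughly linear in x
r = pearson_r(x, y)

# Shifting and rescaling each variable (with positive factors) should leave r unchanged.
r_shifted = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_shifted)
```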

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
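
The pair-counting definition can be sketched directly in Python. The function name `kendall_tau_a` is hypothetical; it implements the plain formula stated here (often called tau-a), whereas statistical libraries commonly report a tie-adjusted variant (tau-b) that can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction for this pair.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair relative to x
tau = kendall_tau_a(x, y)
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.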

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
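
A rough sketch of a bias-corrected Cramér's V in the style of Bergsma's correction is given below. It is an illustration under stated assumptions rather than the report generator's implementation: the function name `cramers_v_corrected` is hypothetical, the chi-squared statistic is computed without any continuity correction, and the example data are made up.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Subtract the expected bias of phi^2 and shrink the table dimensions accordingly.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Perfect association: each label in xs maps to exactly one label in ys.
xs = ["a", "a", "b", "b", "c", "c"] * 5
ys = ["x", "x", "y", "y", "z", "z"] * 5
v_perfect = cramers_v_corrected(xs, ys)

# Independence: every combination of labels occurs equally often.
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 5, ["x", "y", "x", "y"] * 5)
print(v_perfect, v_independent)
```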

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
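
To make the idea of nullity correlation concrete, the sketch below computes it for two columns, using the `dissolutionDate` and `CompanyStatus` values of the first ten sample rows listed in this report. It is a plain-Python illustration with hypothetical helper names (`nullity`, `pearson`); the report's heatmap is produced by its own plotting code, which may differ in detail.

```python
import math

# First ten sample rows of the report, reduced to two columns (None marks a missing cell).
rows = [
    {"dissolutionDate": None, "CompanyStatus": "Active"},
    {"dissolutionDate": "2020-10-20", "CompanyStatus": None},
    {"dissolutionDate": None, "CompanyStatus": "Active"},
    {"dissolutionDate": "2019-04-23", "CompanyStatus": None},
    {"dissolutionDate": None, "CompanyStatus": "Active"},
    {"dissolutionDate": "2021-04-06", "CompanyStatus": None},
    {"dissolutionDate": None, "CompanyStatus": "Active - Proposal to Strike off"},
    {"dissolutionDate": None, "CompanyStatus": None},
    {"dissolutionDate": None, "CompanyStatus": "Active"},
    {"dissolutionDate": None, "CompanyStatus": "Active"},
]


def nullity(rows, column):
    """1 where the cell is missing, 0 where it is present."""
    return [1 if row[column] is None else 0 for row in rows]


def pearson(a, b):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


nullity_corr = pearson(nullity(rows, "dissolutionDate"), nullity(rows, "CompanyStatus"))
print(nullity_corr)
```

On these ten rows the result is strongly negative: rows with a dissolution date mostly lack a company status, and vice versa.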

Sample

First rows

 | df_index+statementID | name | foundingDate | dissolutionDate | countryCode | companiesHouseID | openCorporatesID | openOwnershipRegisterID | CompanyCategory | CompanyStatus | Accounts_AccountCategory | SICCode_SicText_1
0 | 24803009810236444562249 | CIFRA SERVICES LTD | 2021-04-21 | None | GB | 13348512 | https://opencorporates.com/companies/gb/13348512 | http://register.openownership.org/entities/60999516e4e645384b151f9f | Private Limited Company | Active | NO ACCOUNTS FILED | 96090 - Other service activities n.e.c.
1 | 524611031123992823807139 | BRADFORD ONLINE LIMITED | 2018-11-12 | 2020-10-20 | GB | 11673158 | https://opencorporates.com/companies/gb/11673158 | http://register.openownership.org/entities/5bf44c979dfc3fae18c57ab0 | None | None | None | None
2 | 523016034895648864300689 | FIRE PROTECTION PRODUCTS LIMITED | 2004-08-10 | None | GB | NI051395 | https://opencorporates.com/companies/gb/NI051395 | http://register.openownership.org/entities/59b93d3767e4ebf3406ca5bb | Private Limited Company | Active | MICRO ENTITY | 43999 - Other specialised construction activities n.e.c.
3 | 607813605112502038953621 | ALLSITE CATERING LIMITED | 2017-11-14 | 2019-04-23 | GB | 11062947 | https://opencorporates.com/companies/gb/11062947 | http://register.openownership.org/entities/5aba261d9dfc3fae1894f164 | None | None | None | None
4 | 63075333408146780574678 | KNOWLES SYSTEMS LIMITED | 2007-02-12 | None | GB | 06094377 | https://opencorporates.com/companies/gb/06094377 | http://register.openownership.org/entities/59c500f167e4ebf3405d9caa | Private Limited Company | Active | MICRO ENTITY | 82990 - Other business support service activities n.e.c.
5 | 373715458955880793802661 | TRADE PLUS INVESTMENT LTD | 2019-09-24 | 2021-04-06 | GB | 12224782 | https://opencorporates.com/companies/gb/12224782 | http://register.openownership.org/entities/5d8b7ca89dfc3fae180350e5 | None | None | None | None
6 | 30525664068093023364540 | MV MIRJAM LLP | 2015-11-16 | None | GB | OC402894 | https://opencorporates.com/companies/gb/OC402894 | http://register.openownership.org/entities/59b9635567e4ebf3401aad9b | Limited Liability Partnership | Active - Proposal to Strike off | TOTAL EXEMPTION FULL | None Supplied
7 | 2716024097053705099358 | LOMAX LIFESTYLE LTD | 2020-06-16 | None | GB | 12674178 | https://opencorporates.com/companies/gb/12674178 | http://register.openownership.org/entities/5f1f11ec84281976229cd306 | None | None | None | None
8 | 12945659990438031123930 | AYAN INVESTMENTS LTD | 2013-03-15 | None | GB | 08447018 | https://opencorporates.com/companies/gb/08447018 | http://register.openownership.org/entities/59b957da67e4ebf340e4bcdc | Private Limited Company | Active | MICRO ENTITY | 64209 - Activities of other holding companies n.e.c.
9 | 68179362310756853578640 | J & L SYSONBY HOLDINGS LIMITED | 2016-09-05 | None | GB | 10358652 | https://opencorporates.com/companies/gb/10358652 | http://register.openownership.org/entities/59b9e40467e4ebf340201c0f | Private Limited Company | Active | TOTAL EXEMPTION FULL | 47190 - Other retail sale in non-specialised stores

Last rows

 | df_index+statementID | name | foundingDate | dissolutionDate | countryCode | companiesHouseID | openCorporatesID | openOwnershipRegisterID | CompanyCategory | CompanyStatus | Accounts_AccountCategory | SICCode_SicText_1
990 | 40544845625195067271126 | INSPIRE FITNESS 121 LTD | 2019-02-01 | None | GB | 11802688 | https://opencorporates.com/companies/gb/11802688 | http://register.openownership.org/entities/5c5bd9b29dfc3fae1825f457 | Private Limited Company | Active | TOTAL EXEMPTION FULL | 96040 - Physical well-being activities
991 | 50943309642346195000690 | MONKEY GRINDER LIMITED | 2017-06-01 | None | GB | 10797692 | https://opencorporates.com/companies/gb/10797692 | http://register.openownership.org/entities/59b9c79267e4ebf340b6bc16 | Private Limited Company | Active - Proposal to Strike off | DORMANT | 56302 - Public houses and bars
992 | 23516275264334036127378 | ALL PAWS AND CLAWS DOG SUPPLIES LTD | 2021-08-26 | None | GB | 13587509 | https://opencorporates.com/companies/gb/13587509 | http://register.openownership.org/entities/61532e8843b30a9a879722d9 | Private Limited Company | Active | NO ACCOUNTS FILED | 47760 - Retail sale of flowers, plants, seeds, fertilizers, pet animals and pet food in specialised stores
993 | 409713549231433332385833 | SOUTHWELL HOMES LIMITED | 2004-07-19 | None | GB | 05182817 | https://opencorporates.com/companies/gb/05182817 | http://register.openownership.org/entities/59b9250367e4ebf340fc27e6 | Private Limited Company | Active | TOTAL EXEMPTION FULL | 41201 - Construction of commercial buildings
994 | 43671044224546179366862 | RAKSPRING & TRAFFIC JAM RECORDS LTD | 2019-09-26 | None | GB | 12227934 | https://opencorporates.com/companies/gb/12227934 | http://register.openownership.org/entities/5da754c69dfc3fae1823a03a | Private Limited Company | Active | DORMANT | 59200 - Sound recording and music publishing activities
995 | 127712513068543715449624 | DIGITAL FASHION LTD | 2021-08-26 | None | GB | 13587352 | https://opencorporates.com/companies/gb/13587352 | http://register.openownership.org/entities/61532e7b43b30a9a87970b69 | Private Limited Company | Active | NO ACCOUNTS FILED | 47410 - Retail sale of computers, peripheral units and software in specialised stores
996 | 209817967827072728966909 | SMG INTERIORS LIMITED | 2014-08-19 | None | GB | 09181419 | https://opencorporates.com/companies/gb/09181419 | http://register.openownership.org/entities/59b96f8e67e4ebf34056e34a | Private Limited Company | Active | MICRO ENTITY | 43290 - Other construction installation
997 | 59034398747519184758591 | JUNIPERUS VENTURES LIMITED | 2019-08-22 | 2021-03-30 | GB | 12170017 | https://opencorporates.com/companies/gb/12170017 | http://register.openownership.org/entities/5d721e039dfc3fae18dbaf66 | None | None | None | None
998 | 588218422145717058358860 | JBCYBERPRODUCTS LTD | 2022-04-01 | None | GB | 14018076 | https://opencorporates.com/companies/gb/14018076 | http://register.openownership.org/entities/6257428f0421b9f92d0fe9f6 | Private Limited Company | Active | NO ACCOUNTS FILED | 47910 - Retail sale via mail order houses or via Internet
999 | 263217955519609785945237 | THE MARKETING HOUSE LTD | 2022-01-25 | None | GB | 13870291 | https://opencorporates.com/companies/gb/13870291 | http://register.openownership.org/entities/6216c61add90051720175496 | Private Limited Company | Active | NO ACCOUNTS FILED | 70229 - Management consultancy activities other than financial management